Names: [Insert Your Names Here]

Lab 13 Data Investigation 3 (Week 1) - Variable Star Database

Lab 13 Contents

  1. Introduction and Preliminaries
  2. Exercises
  3. Data Investigation 3 - Week 2 Instructions

There are no new concepts introduced in this lab - it just provides another opportunity to practice concepts introduced in Labs 9 and 11. It is also a shorter lab, to allow some in-class time this week to review your final project proposals.

1. Introduction and Preliminaries

In this lab, you will be exploring a table containing information about stars whose brightness changes as a function of time (so-called "variable stars"). There are many types of variable stars, and it is not critical that you understand the details of how or why a star's brightness varies. All of the variable stars in this particular set lie in the field of view of the Kepler spacecraft mission.

A description of the table is here


In [ ]:
##Load packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.stats as st
import scipy.optimize as optimization

# these set the pandas defaults so that it will print ALL values, even for very long lists and large dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [ ]:
##Much like the QuaRCS dataset, there are some values in the table that mean "NaN" or "N/A" or "not measured"
##replace all of these with actual python-recognized NaNs
data = pd.read_csv('Kepler_variables.csv')
data = data.replace(99.999,np.nan)
data = data.replace(99.99,np.nan)
data = data.replace("N/A",np.nan)
data

Check out the link at the top of this page, which gives descriptions of all of the columns in the table. Generally, the first 7 columns contain identifying information for the objects (name, location in the sky, etc.), which are not as interesting as they are unrelated to physical properties, and the last 10 columns contain all of the interesting measured quantities for the stars.
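
To confirm that the placeholder values really were converted, it can help to count the NaNs in each column. The sketch below uses a small made-up table rather than the Kepler file; the column names and values in it are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real table; these column names and values are hypothetical
toy = pd.DataFrame({
    "Name": ["V1", "V2", "V3"],
    "Period": [0.52, 99.999, 3.10],
    "H-K": [0.08, 0.15, 99.99],
    "Type": ["RR", "N/A", "EA"],
})

# Replace every placeholder value with NaN in a single call
toy = toy.replace([99.999, 99.99, "N/A"], np.nan)

# Count missing values per column to check that the replacement worked
missing = toy.isna().sum()
print(missing)

# Positional slicing separates identifier columns from measured ones
ids = toy.iloc[:, :1]        # first column only in this toy table
measured = toy.iloc[:, 1:]   # everything after it
```

On the real table, and assuming the column order matches the description, `data.iloc[:, -10:]` would select the last 10 (measured) columns in the same way.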

2. Exercises

Exercise 1


Using the code from Lab 11 as a reference, write code that will:

1) Isolate only those variable star types (found in the column "Types") with more than 50 entries in the table. You should be left with five types.
2) Make a scatterplot of two quantities (columns) in the table where the different types each have a different color and symbol, as you did in Lab 11 with planet discovery methods. You can write a function that takes a data frame and column names as input and generates this scatterplot, or you can write code that gives columns generic names (like x and y) and just swap out the names of the columns assigned to those variables as needed.
3) To demonstrate that everything is working, make an example plot where the x axis is period and the y axis is the H-K "Color" of the star. You should choose appropriate axis labels, axis limits, axis scalings (linear or log), and legend location to best highlight the data.

Once you are satisfied with the plot, save it and in a separate markdown cell, display the saved plot and note three interesting things that you notice about it/questions that it generates for you.
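
As a starting point, here is one possible sketch of the filtering and plotting steps described above. It runs on synthetic data, and the column names ("Type", "Period", "H-K"), the helper function names, and the marker and color lists are all assumptions; check them against the real table and your Lab 11 code before adapting anything.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data; "Type", "Period", and "H-K" are assumed column names
rng = np.random.default_rng(0)
counts = {"A": 80, "B": 60, "C": 10}
toy = pd.DataFrame({
    "Type": np.repeat(list(counts), list(counts.values())),
    "Period": rng.lognormal(0.0, 1.0, 150),
    "H-K": rng.normal(0.1, 0.05, 150),
})

def keep_common_types(df, col="Type", min_count=50):
    """Return only rows whose value in `col` appears more than min_count times."""
    n = df[col].value_counts()
    return df[df[col].isin(n[n > min_count].index)]

def scatter_by_type(df, xcol, ycol, col="Type", logx=False):
    """Scatter ycol against xcol with one color and marker per category in `col`."""
    markers = ["o", "s", "^", "D", "v"]
    colors = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]
    fig, ax = plt.subplots()
    for i, (name, group) in enumerate(df.groupby(col)):
        ax.scatter(group[xcol], group[ycol],
                   marker=markers[i % len(markers)],
                   color=colors[i % len(colors)],
                   label=name, alpha=0.6)
    if logx:
        ax.set_xscale("log")
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    ax.legend(loc="best")
    return fig, ax

common = keep_common_types(toy)
fig, ax = scatter_by_type(common, "Period", "H-K", logx=True)
fig.savefig("period_vs_color_sketch.png")
```

Wrapping the plot in a function makes it easy to swap in other column pairs later; whether a log axis is appropriate depends on the real data, so treat `logx=True` here as a placeholder choice.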


In [ ]:
## Code for truncating the data to only those types with > 50 entries

In [ ]:
#list of symbols and colors for plot

In [ ]:
#code to loop through types and make plots

plot and questions/observations go here

Exercise 2


Use the code you wrote for Exercise 1 to explore the dataset. Change the "x" and "y" quantities in the scatterplot until you find one pair that, for one particular "type" of variable star, appears to show a nice linear relationship between the two quantities (make sure any log scales you used to generate your plot for Exercise 1 are turned off if you want to see true linear relationships). Once you've found a linear relationship to investigate:

1) Make a dataframe with only this type of star and generate a scatterplot with the appropriate axis labels and ranges.
2) Use the "modeling" notes from several weeks ago as a model to generate a least-squares linear (slope-intercept) fit to the data and overplot it on this same figure and save it.
3) Calculate the chi-squared statistic for goodness of fit of this model.
4) In a separate markdown cell, insert your figure with the data and model, your chi-squared calculation, a description of whether or not you think the fit is "good", and what if any additional information would help you to determine this. If that information is something you know how to find or calculate, do it.
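
The fitting and goodness-of-fit steps above might look roughly like the sketch below. It uses synthetic data and assumes every point has the same known uncertainty `sigma`; the chi-squared form used here (sum of squared residuals divided by sigma squared) is one common definition and may differ from the one in your modeling notes, so check those before relying on it.

```python
import numpy as np
import scipy.optimize as optimization

def line(x, slope, intercept):
    """Slope-intercept model for a straight line."""
    return slope * x + intercept

# Synthetic data: a known line plus Gaussian noise (all values are made up)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
sigma = 0.5  # assumed per-point uncertainty
y = line(x, 2.0, 1.0) + rng.normal(0.0, sigma, x.size)

# Least-squares fit; p0 is an initial guess for (slope, intercept)
params, cov = optimization.curve_fit(line, x, y, p0=[1.0, 0.0])
slope, intercept = params

# Chi-squared: squared residuals scaled by the assumed uncertainty
residuals = y - line(x, slope, intercept)
chi2 = np.sum((residuals / sigma) ** 2)

# Degrees of freedom = number of points minus number of fitted parameters
dof = x.size - len(params)
chi2_red = chi2 / dof
print(slope, intercept, chi2, chi2_red)
```

A reduced chi-squared near 1 suggests the scatter about the line is consistent with the assumed uncertainty. That interpretation depends entirely on `sigma` being realistic, which is one kind of "additional information" item 4 asks you to think about.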

Once you're done, spend a little time thinking about what it might mean that the two quantities you've plotted are linearly related. What do you think it might tell us about the universe? What can you find out about the two quantities, and what do you still need to understand in order to judge the relationship? Add a reflection on these questions to the end of your markdown cell.


In [ ]:
# code to truncate the dataframe to only the sample you want to look at

In [ ]:
#code to create basic plot

In [ ]:
#code to define function for line fit

In [ ]:
#code to calculate fit

In [ ]:
#code to make plot with data + model fit

In [ ]:
#code to calculate chi-squared statistic

plot and questions/observations go here

3. Data Investigation 3 - Week 2 Instructions

Now that you are familiar with the variable star database, you and your partner must come up with a statistical investigation that you would like to complete using this data. It's completely up to you what you choose to investigate, but here are a few broad ideas to guide your thinking:

  • You might choose to isolate a population of variable stars that you noticed in one of the plots and attempt to understand it (descriptive statistics, correlations, etc.) and/or compare it to another population.
  • You might make a quantitative comparison of the TYPES of variable stars and connect this to what you can find out about their physical properties.
  • You might isolate a region of a plot or a subset of stars with apparent correlations between variables and attempt to fit a model to the relationship between them.
  • You might consider adding a fourth variable to one of the plots you made by sizing the points to represent that variable.
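
For the last idea in the list, one way to size points by an additional variable is to rescale that variable into a range of marker areas and pass it to the `s` argument of `scatter`. The sketch below uses made-up data; the column names, the `scale_sizes` helper, and the size range are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data; "x", "y", and "amp" are hypothetical column names
rng = np.random.default_rng(2)
toy = pd.DataFrame({
    "x": rng.uniform(0, 10, 50),
    "y": rng.uniform(0, 1, 50),
    "amp": rng.uniform(0.01, 2.0, 50),
})

def scale_sizes(values, smin=10.0, smax=200.0):
    """Linearly map values onto marker areas between smin and smax (points^2)."""
    v = np.asarray(values, dtype=float)
    lo, hi = np.nanmin(v), np.nanmax(v)
    if hi == lo:
        # All values identical: use a single mid-range size
        return np.full(v.shape, (smin + smax) / 2)
    return smin + (v - lo) / (hi - lo) * (smax - smin)

sizes = scale_sizes(toy["amp"])

fig, ax = plt.subplots()
ax.scatter(toy["x"], toy["y"], s=sizes, alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Because `s` sets marker area rather than radius, a linear mapping like this makes area proportional to the rescaled variable. Semi-transparent markers (`alpha`) help when large points overlap.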

In all cases, I can provide suggestions and guidance, and would be happy to discuss at office hours or by appointment.

Before 5pm next Monday evening (4/24), you must send me a brief e-mail (that you write together, one e-mail per group) describing a plan for how you will approach a question that you have developed. What do you need to know that you don't know already? What kind of plots will you make and what kinds of statistics will you compute? What is your first thought for what your final data representations will look like?


In [1]:
from IPython.display import HTML

def css_styling():
    # Load the course stylesheet and apply it to this notebook
    with open("../custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)

css_styling()


Out[1]: